How can I explore high-dimensional data?
Authors
Abstract
High-dimensional data hinder sample visualization and limit data exploration [1]. In these cases, multivariate analysis techniques such as Factor Analysis (FA) and/or Principal Component Analysis (PCA) can reduce a complex data set to one of lower dimension, revealing hidden features and simplifying interpretation. To make FA and PCA easier to interpret, a practical application of these techniques is described herein.
An orthodontist wants to make a few changes in his clinic to ensure higher-quality treatment. Such changes, however, have to be tailored to the needs and desires of his target customers, so he decided to administer a questionnaire at the end of orthodontic treatment. Patients responded to several questions, which were grouped into the following items: (1) staff helpfulness; (2) staff professionalism; (3) staff manners; (4) attention given by the dentist; (5) dentist's technical quality; (6) waiting time; (7) explanation about treatment; (8) comfortable facilities; (9) waiting time to schedule an appointment; (10) convenience of treatment schedule; (11) parking facilities; (12) telephone service; (13) cleanliness. Patients rated each item on a scale from 1 to 7, with 1 meaning weak and 7 excellent. A total of 50 patients responded to the questionnaire.
Afterwards, the dentist found it difficult to establish a change plan because of the large number of items analyzed. To reduce the amount of data and facilitate interpretation, he used Factor Analysis, a type of multivariate analysis. FA is useful when a large number of variables may provide redundant or duplicated information. Here, redundancy means that some variables are correlated with each other, possibly because they are measuring the same "thing".
Thus, it is possible to reduce the observed variables to a smaller number of factors: groups of correlated variables that share some information. The orthodontist found that the 13 items in the questionnaire were not in fact measuring 13 different things, because of the apparent redundancy shared by some variables. Four new variables, or factors, which were implicit in the correlations, were therefore established: (1) efficiency; (2) comfort; (3) staff-patient relationship; (4) dentist-patient relationship. The factor analysis of the variables used by the orthodontist is shown in Figure 1. Nevertheless, to establish an improvement plan, the orthodontist needed to know which item most influences patients' satisfaction. To this end, he used
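The reduction described in the abstract, from 13 correlated questionnaire items to a few underlying factors, can be illustrated with a minimal sketch. It is not the authors' analysis: the scores are synthetic, the assignment of items to factors in `item_to_factor` is hypothetical, and the extraction is a principal-component decomposition of the item correlation matrix (with the eigenvalue-greater-than-one rule for retention) rather than a full factor analysis with rotation.

```python
import numpy as np

def principal_component_loadings(scores):
    """Eigenvalues (descending) and loadings of the item correlation matrix."""
    corr = np.corrcoef(scores, rowvar=False)      # items are columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading of item i on component j = eigenvector entry * sqrt(eigenvalue j)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings

# Synthetic stand-in for the questionnaire: 50 patients, 13 items, 4 latent factors.
rng = np.random.default_rng(0)
n_patients, n_items, n_factors = 50, 13, 4

# Hypothetical mapping of each item to one latent factor (assumed for illustration:
# 0 = efficiency, 1 = comfort, 2 = staff-patient, 3 = dentist-patient).
item_to_factor = np.array([2, 2, 2, 3, 3, 0, 3, 1, 0, 0, 1, 0, 1])

latent = rng.normal(size=(n_patients, n_factors))
noise = rng.normal(scale=0.5, size=(n_patients, n_items))
raw = latent[:, item_to_factor] + noise

# Map to the 1-7 rating scale used in the questionnaire.
scores = np.clip(np.rint(4 + 1.5 * raw), 1, 7)

eigvals, loadings = principal_component_loadings(scores)
explained = eigvals / eigvals.sum()               # share of total variance
n_retained = int((eigvals > 1.0).sum())           # eigenvalue > 1 retention rule

print("components retained:", n_retained)
print("variance explained by first four:", explained[:4].round(2))
print("loadings on first four components:")
print(loadings[:, :4].round(2))
```

Items that load strongly on the same component are the ones carrying redundant information; in a real analysis those groupings, usually after a rotation step, are what would be named as factors.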
Similar resources
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
We can reach by DNA microarray gene expression to such wealth of information with thousands of variables (genes). Analysis of this information can show genetic reasons of disease and tumor differences. In this study we try to reduce high-dimensional data by statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumor based on gene expression data of...
Finding Corners
Many important image cues such as 'T'-, 'X'- and 'L'-junctions have a local two-dimensional structure. Conventional edge detectors are designed for one-dimensional 'events'. Even the best edge operators cannot reliably detect these two-dimensional features. This contribution proposes a solution to the two-dimensional problem. In this paper, I address the following: • 'L'-junction detection. Previo...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
Simulation of Smoke Emission from Fires in High-Rise Buildings Using the 3D Model Generated from 2-Dimensional Cadastral Data
Having a 3-Dimensional model of high-rise buildings can be used in disaster management such as fire cases to reduce casualties. The fundamental dilemma in 3D building modeling is the unavailability of suitable data sources. However, available cadastral 2D maps could be used as low-cost and attainable resources for 3D building modeling. Smoke will be a great threat to people's health during a f...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Inter-observer agreement between 2-dimensional CT versus 3-dimensional I-Space model in the Diagnosis of Occult Scaphoid Fractures
Background: The I-Space is a radiological imaging system in which Computed Tomography (CT)-scans can be evaluated as a three dimensional hologram. The aim of this study is to analyze the value of virtual reality (I-Space) in diagnosing acute occult scaphoid fractures. Methods: A convenient cohort of 24 patients with a CT-scan from prior studies, without a scaphoid fracture on radiograph, ye...